{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide"
    ]
   },
   "outputs": [],
   "source": [
    "import seaborn as sns\n",
    "sns.set_theme(style=\"white\")"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "Assign a variable to ``x`` to plot a univariate distribution along the x axis:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "penguins = sns.load_dataset(\"penguins\")\n",
    "sns.histplot(data=penguins, x=\"flipper_length_mm\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Flip the plot by assigning the data variable to the y axis:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=penguins, y=\"flipper_length_mm\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check how well the histogram represents the data by specifying a different bin width:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=penguins, x=\"flipper_length_mm\", binwidth=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also define the total number of bins to use:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=penguins, x=\"flipper_length_mm\", bins=30)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add a kernel density estimate to smooth the histogram, providing complementary information about the shape of the distribution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=penguins, x=\"flipper_length_mm\", kde=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If neither `x` nor `y` is assigned, the dataset is treated as wide-form, and a histogram is drawn for each numeric column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=penguins)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can otherwise draw multiple histograms from a long-form dataset with hue mapping:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=penguins, x=\"flipper_length_mm\", hue=\"species\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The default approach to plotting multiple distributions is to \"layer\" them, but you can also \"stack\" them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=penguins, x=\"flipper_length_mm\", hue=\"species\", multiple=\"stack\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overlapping bars can be hard to visually resolve. A different approach would be to draw a step function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(penguins, x=\"flipper_length_mm\", hue=\"species\", element=\"step\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can move even farther away from bars by drawing a polygon with vertices in the center of each bin. This may make it easier to see the shape of the distribution, but use with caution: it will be less obvious to your audience that they are looking at a histogram:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(penguins, x=\"flipper_length_mm\", hue=\"species\", element=\"poly\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To compare the distribution of subsets that differ substantially in size, use independent density normalization:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(\n",
    "    penguins, x=\"bill_length_mm\", hue=\"island\", element=\"step\",\n",
    "    stat=\"density\", common_norm=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's also possible to normalize so that each bar's height shows a probability, proportion, or percent, which make more sense for discrete variables:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tips = sns.load_dataset(\"tips\")\n",
    "sns.histplot(data=tips, x=\"size\", stat=\"percent\", discrete=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can even draw a histogram over categorical variables (although this is an experimental feature):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=tips, x=\"day\", shrink=.8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When using a ``hue`` semantic with discrete data, it can make sense to \"dodge\" the levels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=tips, x=\"day\", hue=\"sex\", multiple=\"dodge\", shrink=.8)"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "Real-world data is often skewed. For heavily skewed distributions, it's better to define the bins in log space. Compare:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "planets = sns.load_dataset(\"planets\")\n",
    "sns.histplot(data=planets, x=\"distance\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To the log-scale version:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=planets, x=\"distance\", log_scale=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are also a number of options for how the histogram appears. You can show unfilled bars:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=planets, x=\"distance\", log_scale=True, fill=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or an unfilled step function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(data=planets, x=\"distance\", log_scale=True, element=\"step\", fill=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step functions, especially when unfilled, make it easy to compare cumulative histograms:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(\n",
    "    data=planets, x=\"distance\", hue=\"method\",\n",
    "    hue_order=[\"Radial Velocity\", \"Transit\"],\n",
    "    log_scale=True, element=\"step\", fill=False,\n",
    "    cumulative=True, stat=\"density\", common_norm=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When both ``x`` and ``y`` are assigned, a bivariate histogram is computed and shown as a heatmap:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(penguins, x=\"bill_depth_mm\", y=\"body_mass_g\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's possible to assign a ``hue`` variable too, although this will not work well if data from the different levels have substantial overlap:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(penguins, x=\"bill_depth_mm\", y=\"body_mass_g\", hue=\"species\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Multiple color maps can make sense when one of the variables is discrete:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(\n",
    "    penguins, x=\"bill_depth_mm\", y=\"species\", hue=\"species\", legend=False\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bivariate histogram accepts all of the same options for computation as its univariate counterpart, using tuples to parametrize ``x`` and ``y`` independently:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(\n",
    "    planets, x=\"year\", y=\"distance\",\n",
    "    bins=30, discrete=(True, False), log_scale=(False, True),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The default behavior makes cells with no observations transparent, although this can be disabled: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(\n",
    "    planets, x=\"year\", y=\"distance\",\n",
    "    bins=30, discrete=(True, False), log_scale=(False, True),\n",
    "    thresh=None,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's also possible to set the threshold and colormap saturation point in terms of the proportion of cumulative counts:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(\n",
    "    planets, x=\"year\", y=\"distance\",\n",
    "    bins=30, discrete=(True, False), log_scale=(False, True),\n",
    "    pthresh=.05, pmax=.9,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To annotate the colormap, add a colorbar:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.histplot(\n",
    "    planets, x=\"year\", y=\"distance\",\n",
    "    bins=30, discrete=(True, False), log_scale=(False, True),\n",
    "    cbar=True, cbar_kws=dict(shrink=.75),\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "py310",
   "language": "python",
   "name": "py310"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
